State prediction error - 183Lab

State prediction error

↔︎ Reward-prediction error (RPE)